| nombre | descripcion |
|---|---|
| TimePermCG5 | Tiempo permanencia |
| PermanentCG5 | variable de status: 1 tiene la plaza |
| MobilityPG24 | Movilidad |
| Q6 | dicipline |
| Q23 | Sex |
| Q24 | Age |
| Q25 | Single |
| Q27 | Children (ver variable Parenthood tb ) |
| Countryb | Country |
| foreign | Extranjero |
| foreignEdu2 | Si te has educado en un país extranjero |
| multidisc | Mobility disciplinar |
| publ_prod2 | Publications |
Nota sobre survreg: survreg() fits accelerated failure models, not proportional hazards models. The coefficients are logarithms of ratios of survival times, so a positive coefficient means longer survival.
Nota para mi: utilizar función survplot dentro de la librería rms, ver http://rstudio-pubs-static.s3.amazonaws.com/5564_bc9e2d9a458c4660aa82882df90b7a6b.html ## Lectura de datos desde fichero de stata
Con la librería foreign podemos leer datos de spss, stata, etc.
library(foreign)
datos <- read.dta("../data/working_10_Cleaner.dta")
head( names(datos))
## [1] "Career_stage" "CodeNumber_enc" "CodeNumber"
## [4] "Countryb" "Current_Institute" "FieldofScience"
Nos quedamos sólo con las variables que nos hacen falta
datos <- datos[,c("ability","Countryb","Q6","Q23","Q24","Q25","Q27",
"Parenthood", "foreign", "foreignEdu2",
"multidisc", "TimePermCG5","PermanentCG5",
"mobilityPG24")]
La variable Countryb tiene como niveles todos los países, elimino las etiquetas de países que no están en los datos y elijo UK como el país de referencia
datos$Countryb <- droplevels(as.factor(as.character(datos$Countryb)))
datos$Countryb <- relevel(datos$Countryb,ref="united kingdom")
Defino la variable time como TimePermCG y status como PermanentCG
datos$time <- datos$TimePermCG5
datos$status <- datos$PermanentCG5
Comprobamos si hay casos perdidos
sum( is.na( datos$time))
## [1] 36
sum( is.na( datos$status))
## [1] 0
Eliminamos los perdidos en time
datos <- datos[!is.na(datos$time),]
Tenemos observaciones censuradas, es decir, entrevistados que no han obtenido la plaza ni sabemos cuando la obtendrán. Unas 1813
table(datos$status)
##
## 0 1
## 1813 4228
Por otro lado también tenemos entrevistados que en time=0 ya tienen la plaza (status=1), una posible solución es sumar 0.5 a time. Hay 1243 casos así
head(with(datos, table(time, status)),10)
## status
## time 0 1
## 0 69 1243
## 1 87 462
## 2 96 394
## 3 127 354
## 4 162 308
## 5 182 258
## 6 177 250
## 7 180 182
## 8 132 157
## 9 100 107
sumamos 0.001 a time
datos$time <- datos$time +0.001
En la variable de ability tengo 43 valores inferiores a 0, preguntar a Ana sobre esta variable, como está construida y si tiene sentido que sea menor que 0. También hay 1627 casos perdidos en esta variable
table(datos$ability<0, exclude = NULL)
##
## FALSE TRUE <NA>
## 4371 43 1627
Eliminar niveles sobrantes en Q6 (disciplina)
table(datos$Q6)
##
## Same as current position Agricultural Sciences
## 0 111
## Engineering and Technology Humanities
## 1317 792
## Medical Sciences Natural Sciences
## 710 1446
## Social Sciences
## 1623
datos$Q6 <- droplevels(datos$Q6)
table(datos$Q6)
##
## Agricultural Sciences Engineering and Technology
## 111 1317
## Humanities Medical Sciences
## 792 710
## Natural Sciences Social Sciences
## 1446 1623
Utilizamos la librería survival
library(survival)
library(rms)
## Loading required package: Hmisc
## Loading required package: grid
## Loading required package: lattice
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
##
## Loading required package: SparseM
##
## Attaching package: 'SparseM'
##
## The following object is masked from 'package:base':
##
## backsolve
El objeto fundamental en análisis de supervivencia es Surv(time,status)
S1 <- with(datos, Surv(time,status))
Un objeto de tipo Surv guarda atributos como que variables se han utilizado, el tipo de censura (right en este caso), etc.
attributes(S1)
## $dim
## [1] 6041 2
##
## $dimnames
## $dimnames[[1]]
## NULL
##
## $dimnames[[2]]
## [1] "time" "status"
##
##
## $type
## [1] "right"
##
## $class
## [1] "Surv"
Si vemos los primeros valores, la cruz puesta después del 12.5 indica que ese individuo está censurado, no tiene plaza en el momento en que se le entrevistó, ni sabemos cuando la obtendrá, pero se puede utilizar para el análisis, ya que sabemos que han pasado 12 ¿años? y todavía no la tiene
S1[1:6]
## [1] 1.001 5.001 14.001+ 12.001+ 2.001 1.001
Si lo vemos en los datos, vemos que el caso 4 no ha obtenido la plaza (status=0)
datos[1:6, c("time","status")]
## time status
## 1 1.001 1
## 2 5.001 1
## 3 14.001 0
## 4 12.001 0
## 5 2.001 1
## 6 1.001 1
Con survfit podemos obtener la curva de supervivencia utilizando el estimador de Kaplan-Meier. Nota (cambiamos por función npsurv de la librería rms que hace lo mismo, pero luego salen gráficos más claros)
survplot(npsurv(S1 ~ 1))
Seleccionando time entre 0 y 20
plot(npsurv(S1 ~ 1,), xlim=c(0,20))
Con survfit también podemos ver si hay diferencias por movilidad.
En primer lugar vemos las medias y medianas de time por movilidad
# todos datos, incluyendo los censurados
with(datos, tapply(time,mobilityPG24, function(x)c(media=mean(x),mediana=median(x))))
## $`0`
## media mediana
## 4.525196 3.001000
##
## $`1`
## media mediana
## 5.687516 5.001000
# solo lso que tienen status=1
with(datos[datos$status==1,], tapply(time,mobilityPG24, function(x)c(media=mean(x),mediana=median(x))))
## $`0`
## media mediana
## 3.768638 2.001000
##
## $`1`
## media mediana
## 4.48063 3.00100
Si se observan diferencias,con mobility=0 hay menos survival, es decir, la gente que no se mueve obtiene la plaza antes. (obtener plaza es el evento modelado, seguir survival significa seguir sin plaza)
Dibujamos las dos curvas de supervivencia
mod1 <- npsurv(S1 ~ mobilityPG24 , datos)
survplot(mod1, ylab=expression(hat(S)(t)))
Para cada time y en cada categoría de mobilityPG24, se ha calculada el valor de survival y el intervalo de confianza. No se aprecian diferencias entre movilidad 0 y movilidad 1
summary(mod1)
## Call: npsurv(formula = S1 ~ mobilityPG24, data = datos)
##
## 115 observations deleted due to missingness
## mobilityPG24=0
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 3575 902 0.7477 0.00726 0.73359 0.7621
## 1.001 2616 311 0.6588 0.00796 0.64339 0.6746
## 2.001 2240 216 0.5953 0.00828 0.57926 0.6117
## 3.001 1959 172 0.5430 0.00846 0.52668 0.5598
## 4.001 1695 164 0.4905 0.00858 0.47394 0.5076
## 5.001 1424 150 0.4388 0.00865 0.42218 0.4561
## 6.001 1175 128 0.3910 0.00868 0.37436 0.4084
## 7.001 956 105 0.3481 0.00868 0.33146 0.3655
## 8.001 757 86 0.3085 0.00868 0.29197 0.3260
## 9.001 610 66 0.2751 0.00866 0.25868 0.2926
## 10.001 505 68 0.2381 0.00858 0.22186 0.2555
## 11.001 395 45 0.2110 0.00850 0.19495 0.2283
## 12.001 312 41 0.1832 0.00841 0.16747 0.2005
## 13.001 251 31 0.1606 0.00830 0.14514 0.1777
## 14.001 204 25 0.1409 0.00816 0.12580 0.1579
## 15.001 166 18 0.1256 0.00803 0.11085 0.1424
## 16.001 135 12 0.1145 0.00794 0.09993 0.1311
## 17.001 115 9 0.1055 0.00786 0.09119 0.1221
## 18.001 95 8 0.0966 0.00780 0.08249 0.1132
## 19.001 80 2 0.0942 0.00779 0.08012 0.1108
## 20.001 77 11 0.0808 0.00766 0.06705 0.0973
## 21.001 61 9 0.0688 0.00749 0.05562 0.0852
## 22.001 52 1 0.0675 0.00746 0.05437 0.0838
## 23.001 50 7 0.0581 0.00722 0.04550 0.0741
## 24.001 42 9 0.0456 0.00676 0.03412 0.0610
## 25.001 30 1 0.0441 0.00671 0.03274 0.0594
## 27.001 26 4 0.0373 0.00648 0.02656 0.0524
## 28.001 20 2 0.0336 0.00634 0.02320 0.0486
## 29.001 16 2 0.0294 0.00621 0.01943 0.0445
## 31.001 12 1 0.0269 0.00615 0.01722 0.0421
## 34.001 7 1 0.0231 0.00636 0.01345 0.0396
## 39.001 3 1 0.0154 0.00758 0.00586 0.0404
##
## mobilityPG24=1
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 2351 341 0.8550 0.00726 0.8408 0.8693
## 1.001 1999 151 0.7904 0.00840 0.7741 0.8070
## 2.001 1829 178 0.7135 0.00936 0.6953 0.7320
## 3.001 1621 182 0.6333 0.01001 0.6140 0.6533
## 4.001 1407 144 0.5685 0.01034 0.5486 0.5892
## 5.001 1211 108 0.5178 0.01051 0.4976 0.5388
## 6.001 1025 122 0.4562 0.01064 0.4358 0.4775
## 7.001 826 77 0.4137 0.01069 0.3932 0.4352
## 8.001 669 71 0.3698 0.01075 0.3493 0.3915
## 9.001 533 41 0.3413 0.01080 0.3208 0.3632
## 10.001 437 41 0.3093 0.01089 0.2887 0.3314
## 11.001 340 28 0.2838 0.01100 0.2631 0.3062
## 12.001 262 33 0.2481 0.01124 0.2270 0.2711
## 13.001 210 24 0.2197 0.01135 0.1986 0.2431
## 14.001 162 15 0.1994 0.01145 0.1782 0.2231
## 15.001 134 13 0.1800 0.01153 0.1588 0.2041
## 16.001 113 5 0.1721 0.01155 0.1509 0.1963
## 17.001 97 12 0.1508 0.01164 0.1296 0.1754
## 18.001 81 5 0.1415 0.01165 0.1204 0.1662
## 19.001 71 6 0.1295 0.01164 0.1086 0.1545
## 20.001 59 5 0.1185 0.01164 0.0978 0.1437
## 21.001 45 5 0.1054 0.01174 0.0847 0.1311
## 22.001 35 3 0.0963 0.01184 0.0757 0.1226
## 23.001 30 3 0.0867 0.01189 0.0663 0.1134
## 25.001 23 2 0.0792 0.01199 0.0588 0.1065
## 26.001 19 1 0.0750 0.01206 0.0547 0.1028
## 31.001 9 1 0.0667 0.01329 0.0451 0.0985
## 34.001 7 1 0.0571 0.01441 0.0349 0.0937
## 37.001 3 1 0.0381 0.01828 0.0149 0.0976
## 43.001 1 1 0.0000 NaN NA NA
Para ver el tiempo mediano hasta obtener la plaza se puede ver
mod1
## Call: npsurv(formula = S1 ~ mobilityPG24, data = datos)
##
## 115 observations deleted due to missingness
## n events median 0.95LCL 0.95UCL
## mobilityPG24=0 3575 2608 4 4 5
## mobilityPG24=1 2351 1620 6 5 6
Conclusión: En principio si hay diferncias según movilidad.
# todos datos, incluyendo los censurados
with(datos, tapply(time,Q6, function(x)c(media=mean(x),mediana=median(x))))
## $`Agricultural Sciences`
## media mediana
## 6.406405 5.001000
##
## $`Engineering and Technology`
## media mediana
## 4.831676 3.001000
##
## $Humanities
## media mediana
## 5.218172 4.001000
##
## $`Medical Sciences`
## media mediana
## 6.072831 5.001000
##
## $`Natural Sciences`
## media mediana
## 5.927694 5.001000
##
## $`Social Sciences`
## media mediana
## 4.287506 3.001000
# solo lso que tienen status=1
with(datos[datos$status==1,], tapply(time,Q6, function(x)c(media=mean(x),mediana=median(x))))
## $`Agricultural Sciences`
## media mediana
## 5.410639 4.001000
##
## $`Engineering and Technology`
## media mediana
## 3.86575 2.00100
##
## $Humanities
## media mediana
## 3.795224 2.001000
##
## $`Medical Sciences`
## media mediana
## 4.898059 4.001000
##
## $`Natural Sciences`
## media mediana
## 4.871999 4.001000
##
## $`Social Sciences`
## media mediana
## 3.054957 1.001000
Dibujamos las curvas de supervivencia
# elijo una paleta de colores
library(RColorBrewer)
colores <- brewer.pal(10,"Set3")
mod2 <- npsurv(S1 ~ Q6 , datos)
survplot(mod2, col = colores[1:6],
ylab = expression(hat(S)(t)),
time.inc = 5,
conf = "none")
summary(mod2)
## Call: npsurv(formula = S1 ~ Q6, data = datos)
##
## 42 observations deleted due to missingness
## Q6=Agricultural Sciences
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 111 20 0.8198 0.0365 0.7514 0.895
## 1.001 91 4 0.7838 0.0391 0.7108 0.864
## 2.001 85 10 0.6916 0.0440 0.6104 0.783
## 3.001 72 5 0.6435 0.0459 0.5596 0.740
## 4.001 66 4 0.6045 0.0471 0.5189 0.704
## 5.001 59 8 0.5226 0.0488 0.4351 0.628
## 6.001 50 4 0.4808 0.0492 0.3934 0.588
## 7.001 43 5 0.4249 0.0494 0.3383 0.534
## 8.001 34 3 0.3874 0.0496 0.3015 0.498
## 9.001 27 6 0.3013 0.0495 0.2184 0.416
## 10.001 21 3 0.2583 0.0482 0.1791 0.372
## 11.001 17 2 0.2279 0.0471 0.1520 0.342
## 12.001 14 1 0.2116 0.0465 0.1376 0.325
## 13.001 12 1 0.1940 0.0458 0.1221 0.308
## 15.001 11 1 0.1763 0.0449 0.1070 0.291
## 16.001 10 1 0.1587 0.0438 0.0924 0.272
## 17.001 9 1 0.1411 0.0423 0.0784 0.254
## 20.001 7 1 0.1209 0.0408 0.0624 0.234
## 22.001 6 1 0.1008 0.0386 0.0475 0.214
## 25.001 5 2 0.0605 0.0320 0.0214 0.171
##
## Q6=Engineering and Technology
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 1317 276 0.7904 0.0112 0.7688 0.8127
## 1.001 1017 125 0.6933 0.0128 0.6687 0.7188
## 2.001 874 91 0.6211 0.0135 0.5952 0.6481
## 3.001 765 84 0.5529 0.0139 0.5263 0.5809
## 4.001 657 68 0.4957 0.0141 0.4688 0.5241
## 5.001 556 56 0.4457 0.0142 0.4188 0.4744
## 6.001 464 45 0.4025 0.0142 0.3757 0.4313
## 7.001 369 41 0.3578 0.0142 0.3310 0.3868
## 8.001 286 28 0.3228 0.0143 0.2959 0.3520
## 9.001 244 18 0.2990 0.0143 0.2722 0.3283
## 10.001 204 19 0.2711 0.0143 0.2444 0.3007
## 11.001 162 16 0.2443 0.0144 0.2177 0.2742
## 12.001 123 16 0.2126 0.0145 0.1859 0.2431
## 13.001 97 17 0.1753 0.0145 0.1490 0.2062
## 14.001 73 2 0.1705 0.0145 0.1443 0.2015
## 15.001 68 5 0.1580 0.0145 0.1320 0.1891
## 16.001 58 1 0.1552 0.0145 0.1293 0.1864
## 17.001 52 5 0.1403 0.0146 0.1145 0.1720
## 18.001 43 5 0.1240 0.0146 0.0985 0.1561
## 19.001 36 2 0.1171 0.0146 0.0918 0.1494
## 20.001 32 4 0.1025 0.0145 0.0777 0.1351
## 21.001 26 3 0.0906 0.0143 0.0665 0.1235
## 23.001 22 4 0.0742 0.0139 0.0514 0.1070
## 24.001 17 2 0.0654 0.0136 0.0436 0.0982
## 25.001 15 1 0.0611 0.0133 0.0398 0.0937
## 27.001 13 2 0.0517 0.0128 0.0318 0.0841
## 31.001 7 1 0.0443 0.0129 0.0250 0.0786
## 34.001 3 1 0.0295 0.0148 0.0110 0.0790
## 37.001 1 1 0.0000 NaN NA NA
##
## Q6=Humanities
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 792 196 0.7525 0.0153 0.7231 0.783
## 1.001 585 66 0.6676 0.0168 0.6355 0.701
## 2.001 507 46 0.6071 0.0175 0.5737 0.642
## 3.001 456 40 0.5538 0.0179 0.5199 0.590
## 4.001 397 27 0.5161 0.0181 0.4819 0.553
## 5.001 345 29 0.4728 0.0182 0.4383 0.510
## 6.001 291 27 0.4289 0.0184 0.3943 0.467
## 7.001 243 25 0.3848 0.0185 0.3501 0.423
## 8.001 203 20 0.3469 0.0185 0.3124 0.385
## 9.001 166 13 0.3197 0.0185 0.2853 0.358
## 10.001 139 11 0.2944 0.0186 0.2601 0.333
## 11.001 123 6 0.2800 0.0186 0.2459 0.319
## 12.001 106 11 0.2510 0.0186 0.2170 0.290
## 13.001 84 9 0.2241 0.0186 0.1904 0.264
## 14.001 68 6 0.2043 0.0187 0.1708 0.244
## 15.001 57 4 0.1900 0.0187 0.1567 0.230
## 16.001 49 5 0.1706 0.0187 0.1376 0.211
## 17.001 40 4 0.1535 0.0187 0.1210 0.195
## 18.001 32 1 0.1487 0.0187 0.1163 0.190
## 19.001 29 1 0.1436 0.0187 0.1112 0.185
## 21.001 26 1 0.1381 0.0188 0.1057 0.180
## 22.001 23 1 0.1321 0.0189 0.0997 0.175
## 23.001 21 1 0.1258 0.0190 0.0935 0.169
## 27.001 13 1 0.1161 0.0199 0.0830 0.162
## 28.001 11 1 0.1056 0.0207 0.0719 0.155
## 39.001 2 1 0.0528 0.0387 0.0125 0.222
## 43.001 1 1 0.0000 NaN NA NA
##
## Q6=Medical Sciences
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 710 112 0.8423 0.0137 0.8159 0.869
## 1.001 588 36 0.7907 0.0153 0.7613 0.821
## 2.001 537 36 0.7377 0.0166 0.7058 0.771
## 3.001 484 41 0.6752 0.0179 0.6411 0.711
## 4.001 425 33 0.6228 0.0187 0.5872 0.660
## 5.001 372 30 0.5725 0.0193 0.5360 0.612
## 6.001 323 42 0.4981 0.0199 0.4606 0.539
## 7.001 269 29 0.4444 0.0201 0.4067 0.486
## 8.001 217 25 0.3932 0.0202 0.3555 0.435
## 9.001 179 17 0.3559 0.0202 0.3183 0.398
## 10.001 151 15 0.3205 0.0202 0.2833 0.363
## 11.001 122 13 0.2864 0.0201 0.2495 0.329
## 12.001 99 8 0.2632 0.0201 0.2266 0.306
## 13.001 89 9 0.2366 0.0199 0.2006 0.279
## 14.001 72 9 0.2070 0.0197 0.1718 0.250
## 15.001 57 7 0.1816 0.0195 0.1471 0.224
## 16.001 45 2 0.1735 0.0195 0.1393 0.216
## 17.001 40 2 0.1649 0.0194 0.1309 0.208
## 18.001 35 1 0.1601 0.0194 0.1262 0.203
## 19.001 29 2 0.1491 0.0196 0.1152 0.193
## 20.001 26 2 0.1376 0.0197 0.1040 0.182
## 21.001 19 2 0.1231 0.0201 0.0894 0.170
## 22.001 17 1 0.1159 0.0202 0.0824 0.163
## 23.001 13 1 0.1070 0.0205 0.0735 0.156
## 24.001 11 1 0.0973 0.0208 0.0639 0.148
##
## Q6=Natural Sciences
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 1446 175 0.8790 0.00858 0.8623 0.8959
## 1.001 1262 74 0.8274 0.00995 0.8082 0.8472
## 2.001 1175 124 0.7401 0.01158 0.7178 0.7632
## 3.001 1039 105 0.6653 0.01250 0.6413 0.6903
## 4.001 916 117 0.5803 0.01315 0.5551 0.6067
## 5.001 778 81 0.5199 0.01338 0.4943 0.5468
## 6.001 656 83 0.4541 0.01350 0.4284 0.4814
## 7.001 529 44 0.4164 0.01352 0.3907 0.4437
## 8.001 437 51 0.3678 0.01355 0.3422 0.3953
## 9.001 346 27 0.3391 0.01357 0.3135 0.3667
## 10.001 284 39 0.2925 0.01360 0.2670 0.3204
## 11.001 207 20 0.2642 0.01368 0.2388 0.2925
## 12.001 160 26 0.2213 0.01381 0.1958 0.2501
## 13.001 121 12 0.1994 0.01381 0.1740 0.2284
## 14.001 97 14 0.1706 0.01380 0.1456 0.1999
## 15.001 77 5 0.1595 0.01376 0.1347 0.1889
## 16.001 67 5 0.1476 0.01372 0.1230 0.1771
## 17.001 58 5 0.1349 0.01367 0.1106 0.1645
## 18.001 49 4 0.1239 0.01362 0.0999 0.1537
## 19.001 45 1 0.1211 0.01359 0.0972 0.1509
## 20.001 42 4 0.1096 0.01346 0.0861 0.1394
## 21.001 33 4 0.0963 0.01337 0.0734 0.1264
## 23.001 25 3 0.0847 0.01333 0.0623 0.1153
## 24.001 20 3 0.0720 0.01319 0.0503 0.1031
## 26.001 15 1 0.0672 0.01316 0.0458 0.0987
## 27.001 12 1 0.0616 0.01320 0.0405 0.0938
## 29.001 9 2 0.0479 0.01336 0.0278 0.0828
## 34.001 5 1 0.0383 0.01370 0.0190 0.0772
##
## Q6=Social Sciences
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 0.001 1623 461 0.7160 0.0112 0.6944 0.7382
## 1.001 1147 150 0.6223 0.0121 0.5991 0.6464
## 2.001 971 82 0.5698 0.0124 0.5461 0.5945
## 3.001 848 74 0.5201 0.0126 0.4960 0.5453
## 4.001 727 58 0.4786 0.0127 0.4543 0.5041
## 5.001 610 51 0.4386 0.0128 0.4142 0.4644
## 6.001 500 46 0.3982 0.0129 0.3736 0.4244
## 7.001 407 38 0.3610 0.0131 0.3363 0.3875
## 8.001 323 28 0.3297 0.0132 0.3049 0.3566
## 9.001 251 26 0.2956 0.0134 0.2704 0.3231
## 10.001 208 22 0.2643 0.0136 0.2390 0.2923
## 11.001 162 15 0.2398 0.0137 0.2144 0.2682
## 12.001 126 12 0.2170 0.0139 0.1914 0.2460
## 13.001 107 6 0.2048 0.0140 0.1792 0.2341
## 14.001 94 9 0.1852 0.0141 0.1596 0.2150
## 15.001 77 9 0.1636 0.0142 0.1380 0.1938
## 16.001 65 3 0.1560 0.0142 0.1306 0.1864
## 17.001 54 3 0.1474 0.0142 0.1219 0.1781
## 18.001 46 2 0.1409 0.0143 0.1155 0.1720
## 19.001 40 2 0.1339 0.0144 0.1084 0.1654
## 20.001 34 5 0.1142 0.0148 0.0886 0.1471
## 21.001 24 4 0.0952 0.0151 0.0698 0.1298
## 22.001 18 1 0.0899 0.0151 0.0646 0.1250
## 23.001 16 1 0.0843 0.0152 0.0592 0.1200
## 24.001 15 3 0.0674 0.0149 0.0437 0.1041
## 31.001 6 1 0.0562 0.0161 0.0320 0.0986
Para ver el tiempo mediano hasta obtener la plaza se puede ver
mod2
## Call: npsurv(formula = S1 ~ Q6, data = datos)
##
## 42 observations deleted due to missingness
## n events median 0.95LCL 0.95UCL
## Q6=Agricultural Sciences 111 83 6 5 8
## Q6=Engineering and Technology 1317 939 4 4 5
## Q6=Humanities 792 554 5 4 6
## Q6=Medical Sciences 710 476 6 6 7
## Q6=Natural Sciences 1446 1031 6 5 6
## Q6=Social Sciences 1623 1112 4 3 5
Si parece que según la disciplina las curvas de supervivencia son distintas, podríamos pensar en ir ajustando modelos de regresión paramétricos
Con survreg podemos construir modelos paramétricos (la otra opción son los semiparamétricos como los de regresión de cox)
Se pueden considerar varias distribuciones para la distribución del riesgo según se incrementa el tiempo (age-specific hazard models). Sólo pongo la exponencial y la weibull
| Distribución | Riesgo (hazard) |
|---|---|
| Exponencial | \(constante = \dfrac{1}{\mu} = \lambda\) |
| Weibull | \(\alpha\lambda\left(\lambda t\right)^{\alpha-1}\) |
La distribución Weibull es la más flexible ya que puede tratar co hazards que se incrementan con el tiempo de forma aceleraca \((\alpha >1)\) o decelerada \((\alpha<1)\)
Veamos en primer lugar el modelo con var indep las disciplinas y dist exponencial. La categoría de referencia de Q6 es Agricultural Sciences
mod3 <- survreg(S1 ~ Q6, data=datos, dist ="exponential")
summary(mod3)
##
## Call:
## survreg(formula = S1 ~ Q6, data = datos, dist = "exponential")
## Value Std. Error z p
## (Intercept) 2.1480 0.110 19.569 2.84e-85
## Q6Engineering and Technology -0.2345 0.115 -2.048 4.06e-02
## Q6Humanities -0.1384 0.118 -1.176 2.39e-01
## Q6Medical Sciences 0.0557 0.119 0.468 6.40e-01
## Q6Natural Sciences -0.0301 0.114 -0.264 7.92e-01
## Q6Social Sciences -0.3142 0.114 -2.761 5.76e-03
##
## Scale fixed at 1
##
## Exponential distribution
## Loglik(model)= -12555.1 Loglik(intercept only)= -12592
## Chisq= 73.93 on 5 degrees of freedom, p= 1.6e-14
## Number of Newton-Raphson Iterations: 7
## n=5999 (42 observations deleted due to missingness)
La tabla con los coeficientes y p-valores, redondeando a 3 decimales
round(summary(mod3)$table,3)
## Value Std. Error z p
## (Intercept) 2.148 0.110 19.569 0.000
## Q6Engineering and Technology -0.234 0.115 -2.048 0.041
## Q6Humanities -0.138 0.118 -1.176 0.239
## Q6Medical Sciences 0.056 0.119 0.468 0.640
## Q6Natural Sciences -0.030 0.114 -0.264 0.792
## Q6Social Sciences -0.314 0.114 -2.761 0.006
Bajo la hipótesis de errores exponenciales, vemos que la supervivencia es significativamente distinta de Agricultural Sciences para la categoría de Social Science.
Antes de considerar si agrupamos categorías, veamos si es mejor considerar una distribución Weibull
# la Weibull es la distribución por defecto en survreg y no hace falta especificarlo
mod4 <- survreg(S1 ~ Q6, data=datos)
summary(mod4)
##
## Call:
## survreg(formula = S1 ~ Q6, data = datos)
## Value Std. Error z p
## (Intercept) 2.119 0.3016 7.026 2.12e-12
## Q6Engineering and Technology -0.236 0.3146 -0.750 4.53e-01
## Q6Humanities -0.192 0.3234 -0.592 5.54e-01
## Q6Medical Sciences 0.292 0.3269 0.894 3.71e-01
## Q6Natural Sciences 0.206 0.3135 0.657 5.11e-01
## Q6Social Sciences -0.388 0.3127 -1.241 2.15e-01
## Log(scale) 1.011 0.0141 71.886 0.00e+00
##
## Scale= 2.75
##
## Weibull distribution
## Loglik(model)= -8725.6 Loglik(intercept only)= -8744.6
## Chisq= 38.06 on 5 degrees of freedom, p= 3.7e-07
## Number of Newton-Raphson Iterations: 7
## n=5999 (42 observations deleted due to missingness)
round(summary(mod4)$table,4)
## Value Std. Error z p
## (Intercept) 2.1192 0.3016 7.0261 0.0000
## Q6Engineering and Technology -0.2359 0.3146 -0.7496 0.4535
## Q6Humanities -0.1916 0.3234 -0.5923 0.5536
## Q6Medical Sciences 0.2922 0.3269 0.8938 0.3714
## Q6Natural Sciences 0.2060 0.3135 0.6572 0.5111
## Q6Social Sciences -0.3880 0.3127 -1.2409 0.2146
## Log(scale) 1.0108 0.0141 71.8863 0.0000
El parámetro de scale=1.11 indica que la pendiente de la función hazard se incrementa con el tiempo. Al igual que antes, las categorías significativamente distintas son Engineering and Technology Humanities y Social Science.
Comparo los dos modelos usando la función anova
anova(mod3, mod4)
## Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi)
## 1 Q6 5993 25110.11 NA NA NA
## 2 Q6 5992 17451.12 = 1 7658.998 0
y es mejor el modelo con la distribución Weibull
Podemos probar a unir todos los niveles salvo Social Sciences
datos$Q6_rec <- datos$Q6
levels(datos$Q6_rec)
## [1] "Agricultural Sciences" "Engineering and Technology"
## [3] "Humanities" "Medical Sciences"
## [5] "Natural Sciences" "Social Sciences"
levels(datos$Q6_rec)[1:5] <- "Grupo1"
levels(datos$Q6_rec)
## [1] "Grupo1" "Social Sciences"
mod5 <- survreg(S1 ~ Q6_rec, data=datos)
anova(mod4,mod5)
## Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi)
## 1 Q6 5992 17451.12 NA NA NA
## 2 Q6_rec 5996 17471.87 = -4 -20.75325 0.0003544068
Al ser el p_valor menor de 0.5 significa que ambos modelos son distintos y no está justificado unir esas categorías.
Probemos en unir Agricultural sciences con Natural Sciences y Medical Scienes. Machacamos mod5
datos$Q6_rec <- datos$Q6
levels(datos$Q6_rec)
## [1] "Agricultural Sciences" "Engineering and Technology"
## [3] "Humanities" "Medical Sciences"
## [5] "Natural Sciences" "Social Sciences"
levels(datos$Q6_rec)[c(1,4,5)] <- "Grupo1"
levels(datos$Q6_rec)
## [1] "Grupo1" "Engineering and Technology"
## [3] "Humanities" "Social Sciences"
mod5 <- survreg(S1 ~ Q6_rec, data=datos)
anova(mod4,mod5)
## Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi)
## 1 Q6 5992 17451.12 NA NA NA
## 2 Q6_rec 5994 17451.99 = -2 -0.8752751 0.6455597
Estos dos modelos si son equivalentes, podemos unir esos niveles
summary(mod5)
##
## Call:
## survreg(formula = S1 ~ Q6_rec, data = datos)
## Value Std. Error z p
## (Intercept) 2.341 0.0693 33.78 3.95e-250
## Q6_recEngineering and Technology -0.458 0.1132 -4.04 5.24e-05
## Q6_recHumanities -0.413 0.1356 -3.05 2.31e-03
## Q6_recSocial Sciences -0.610 0.1075 -5.67 1.42e-08
## Log(scale) 1.011 0.0141 71.89 0.00e+00
##
## Scale= 2.75
##
## Weibull distribution
## Loglik(model)= -8726 Loglik(intercept only)= -8744.6
## Chisq= 37.18 on 3 degrees of freedom, p= 4.2e-08
## Number of Newton-Raphson Iterations: 7
## n=5999 (42 observations deleted due to missingness)
Tomamos como referencia “Social Sciences”, para ver si hay diferencias
datos$Q6_rec2 <- relevel(datos$Q6_rec,ref="Social Sciences")
mod6 <- survreg(S1 ~ Q6_rec2, data=datos)
round(summary(mod6)$table,3)
## Value Std. Error z p
## (Intercept) 1.731 0.082 21.001 0.000
## Q6_rec2Grupo1 0.610 0.108 5.671 0.000
## Q6_rec2Engineering and Technology 0.152 0.122 1.249 0.212
## Q6_rec2Humanities 0.196 0.143 1.375 0.169
## Log(scale) 1.011 0.014 71.888 0.000
Parece que entre Social sciences y Engineering and Technology no hay diferencias. Unamos a ver si se puede simplificar el modelo.
datos$Q6_rec3 <- datos$Q6_rec2
levels(datos$Q6_rec3)
## [1] "Social Sciences" "Grupo1"
## [3] "Engineering and Technology" "Humanities"
levels(datos$Q6_rec3)[c(1,3)] <- "Grupo2"
levels(datos$Q6_rec3)
## [1] "Grupo2" "Grupo1" "Humanities"
mod7 <- survreg(S1 ~ Q6_rec3, data = datos)
anova(mod6, mod7)
## Terms Resid. Df -2*LL Test Df Deviance Pr(>Chi)
## 1 Q6_rec2 5994 17451.99 NA NA NA
## 2 Q6_rec3 5995 17453.55 = -1 -1.562549 0.2112924
round(summary(mod7)$table,3)
## Value Std. Error z p
## (Intercept) 1.802 0.061 29.662 0.000
## Q6_rec3Grupo1 0.539 0.092 5.864 0.000
## Q6_rec3Humanities 0.126 0.132 0.955 0.339
## Log(scale) 1.011 0.014 71.892 0.000
Grupo 1: Agricultural Science, Medical Science, Natural Science Grupo 2: Engineering and Technology, Social Scienc Grupo 3: Humanities
with(datos, table(Q6, Q6_rec3))
## Q6_rec3
## Q6 Grupo2 Grupo1 Humanities
## Agricultural Sciences 0 111 0
## Engineering and Technology 1317 0 0
## Humanities 0 0 792
## Medical Sciences 0 710 0
## Natural Sciences 0 1446 0
## Social Sciences 1623 0 0
La curva de supervivencia empírica es
survplot(npsurv(S1 ~ Q6_rec3, datos), lwd=2, col=colores[c(1,4,5)])
survfit(S1 ~ Q6_rec3, datos)
## Call: survfit(formula = S1 ~ Q6_rec3, data = datos)
##
## 42 observations deleted due to missingness
## n events median 0.95LCL 0.95UCL
## Q6_rec3=Grupo2 2940 2051 4 4 5
## Q6_rec3=Grupo1 2267 1590 6 6 6
## Q6_rec3=Humanities 792 554 5 4 6
mod8 <- survfit(S1 ~ Q6 + mobilityPG24, data = datos)
plot(mod8, col=colores, lty=c(rep(1,2),rep(2,2),rep(3,2),rep(4,2), rep(5,2),
rep(6,2)), lwd=c(rep(2,6),rep(3,6)))
Veamos las medianas calculadas por survfit y sus intervalos de confianza
mod8
## Call: survfit(formula = S1 ~ Q6 + mobilityPG24, data = datos)
##
## 156 observations deleted due to missingness
## n events median 0.95LCL
## Q6=Agricultural Sciences, mobilityPG24=0 68 54 5 3
## Q6=Agricultural Sciences, mobilityPG24=1 39 29 6 5
## Q6=Engineering and Technology, mobilityPG24=0 876 646 4 4
## Q6=Engineering and Technology, mobilityPG24=1 414 293 5 4
## Q6=Humanities, mobilityPG24=0 504 372 4 3
## Q6=Humanities, mobilityPG24=1 264 182 6 5
## Q6=Medical Sciences, mobilityPG24=0 437 302 6 5
## Q6=Medical Sciences, mobilityPG24=1 259 174 7 6
## Q6=Natural Sciences, mobilityPG24=0 548 416 5 4
## Q6=Natural Sciences, mobilityPG24=1 885 615 6 6
## Q6=Social Sciences, mobilityPG24=0 1115 798 4 3
## Q6=Social Sciences, mobilityPG24=1 476 314 4 3
## 0.95UCL
## Q6=Agricultural Sciences, mobilityPG24=0 8
## Q6=Agricultural Sciences, mobilityPG24=1 10
## Q6=Engineering and Technology, mobilityPG24=0 5
## Q6=Engineering and Technology, mobilityPG24=1 6
## Q6=Humanities, mobilityPG24=0 5
## Q6=Humanities, mobilityPG24=1 7
## Q6=Medical Sciences, mobilityPG24=0 7
## Q6=Medical Sciences, mobilityPG24=1 8
## Q6=Natural Sciences, mobilityPG24=0 6
## Q6=Natural Sciences, mobilityPG24=1 7
## Q6=Social Sciences, mobilityPG24=0 4
## Q6=Social Sciences, mobilityPG24=1 6
Aquí hay algo interesante, el tiempo mediano en obtener la plaza es un año mayor para los que se movieron que para los que no, en todas las disciplinas, excepto para los de Social Sciences, cuya mediana es la misma. Si vemos las curvas para los de Social Sciences, vemos que la supervivencia es sistemáticamente mayor para los que se mueven qeu para los que no, es decir, si te mueves obtienes más tarde la plaza, aunque en Social Sciences esto es menos acusado.
Nota: Añado Countryb, considero la triple interacción y uso step para ver que interacción es significativa
mod9 <- survreg(S1 ~ Q6 * mobilityPG24 * Countryb , datos)
mod9.step <- step(mod9)
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Start: AIC=17147.44
## S1 ~ Q6 * mobilityPG24 * Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Df AIC
## - Q6:mobilityPG24:Countryb 45 17101
## <none> 17147
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
##
## Step: AIC=17101.38
## S1 ~ Q6 + mobilityPG24 + Countryb + Q6:mobilityPG24 + Q6:Countryb +
## mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Df AIC
## - Q6:Countryb 45 17060
## - Q6:mobilityPG24 5 17093
## <none> 17101
## - mobilityPG24:Countryb 9 17118
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
##
## Step: AIC=17059.88
## S1 ~ Q6 + mobilityPG24 + Countryb + Q6:mobilityPG24 + mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Df AIC
## - Q6:mobilityPG24 5 17051
## <none> 17060
## - mobilityPG24:Countryb 9 17076
## Warning in nobs.default(fit, use.fallback = TRUE): no 'nobs' method is
## available
##
## Step: AIC=17051.37
## S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb
## Warning in nobs.default(object, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Warning in nobs.default(nfit, use.fallback = TRUE): no 'nobs' method is
## available
## Df AIC
## <none> 17051
## - mobilityPG24:Countryb 9 17068
## - Q6 5 17265
El procedimiento por pasos nos dice que la interacción significativa es la de movilidad y país. Esto va en línea con lo que pensábamos, la movilidad influye sobre los tiempos hasta conseguir la plaza de manera diferente en cada país.. 7
summary(mod9.step)
##
## Call:
## survreg(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
## data = datos)
## Value Std. Error z p
## (Intercept) 1.2926 0.323 4.000 6.34e-05
## Q6Engineering and Technology -0.1010 0.312 -0.324 7.46e-01
## Q6Humanities -0.1546 0.321 -0.481 6.30e-01
## Q6Medical Sciences 0.4314 0.325 1.328 1.84e-01
## Q6Natural Sciences 0.2309 0.312 0.741 4.59e-01
## Q6Social Sciences -0.2019 0.311 -0.648 5.17e-01
## mobilityPG24 0.7523 0.182 4.144 3.41e-05
## Countrybbelgium 0.9766 0.373 2.617 8.88e-03
## Countrybfrance 0.0681 0.193 0.353 7.24e-01
## Countrybgermany 1.3793 0.218 6.332 2.43e-10
## Countrybitaly 0.5620 0.165 3.416 6.35e-04
## Countrybnetherlands 0.1199 0.224 0.534 5.93e-01
## Countrybpoland 0.4001 0.352 1.136 2.56e-01
## Countrybspain -0.4016 0.197 -2.039 4.15e-02
## Countrybsweden 0.8114 0.195 4.167 3.09e-05
## Countrybswitzerland 1.0635 0.471 2.258 2.40e-02
## mobilityPG24:Countrybbelgium 0.3110 0.559 0.556 5.78e-01
## mobilityPG24:Countrybfrance -0.0699 0.334 -0.209 8.34e-01
## mobilityPG24:Countrybgermany -0.7901 0.337 -2.346 1.90e-02
## mobilityPG24:Countrybitaly -1.0698 0.282 -3.793 1.49e-04
## mobilityPG24:Countrybnetherlands 0.0623 0.344 0.181 8.56e-01
## mobilityPG24:Countrybpoland -0.7943 0.604 -1.316 1.88e-01
## mobilityPG24:Countrybspain 0.6661 0.328 2.029 4.25e-02
## mobilityPG24:Countrybsweden -0.2841 0.305 -0.931 3.52e-01
## mobilityPG24:Countrybswitzerland -0.5108 0.536 -0.953 3.41e-01
## Log(scale) 0.9924 0.014 70.859 0.00e+00
##
## Scale= 2.7
##
## Weibull distribution
## Loglik(model)= -8499.7 Loglik(intercept only)= -8605.2
## Chisq= 210.96 on 24 degrees of freedom, p= 0
## Number of Newton-Raphson Iterations: 7
## n=5885 (156 observations deleted due to missingness)
round(summary(mod9.step)$table, 3)
## Value Std. Error z p
## (Intercept) 1.293 0.323 4.000 0.000
## Q6Engineering and Technology -0.101 0.312 -0.324 0.746
## Q6Humanities -0.155 0.321 -0.481 0.630
## Q6Medical Sciences 0.431 0.325 1.328 0.184
## Q6Natural Sciences 0.231 0.312 0.741 0.459
## Q6Social Sciences -0.202 0.311 -0.648 0.517
## mobilityPG24 0.752 0.182 4.144 0.000
## Countrybbelgium 0.977 0.373 2.617 0.009
## Countrybfrance 0.068 0.193 0.353 0.724
## Countrybgermany 1.379 0.218 6.332 0.000
## Countrybitaly 0.562 0.165 3.416 0.001
## Countrybnetherlands 0.120 0.224 0.534 0.593
## Countrybpoland 0.400 0.352 1.136 0.256
## Countrybspain -0.402 0.197 -2.039 0.041
## Countrybsweden 0.811 0.195 4.167 0.000
## Countrybswitzerland 1.064 0.471 2.258 0.024
## mobilityPG24:Countrybbelgium 0.311 0.559 0.556 0.578
## mobilityPG24:Countrybfrance -0.070 0.334 -0.209 0.834
## mobilityPG24:Countrybgermany -0.790 0.337 -2.346 0.019
## mobilityPG24:Countrybitaly -1.070 0.282 -3.793 0.000
## mobilityPG24:Countrybnetherlands 0.062 0.344 0.181 0.856
## mobilityPG24:Countrybpoland -0.794 0.604 -1.316 0.188
## mobilityPG24:Countrybspain 0.666 0.328 2.029 0.042
## mobilityPG24:Countrybsweden -0.284 0.305 -0.931 0.352
## mobilityPG24:Countrybswitzerland -0.511 0.536 -0.953 0.341
## Log(scale) 0.992 0.014 70.859 0.000
Las categorías de referencia en cada variable son Social Sciences para Q6, 0 para mobilityPG24 y United Kingdom para Countryb.. Lo podemos cambiar según que nos interese tener en cada categoría
Interpretaciones:
Un coeficiente positivo significa más supervivencia que la categoría de referencia.
S(t, x= United kingdom, movilidad 1 y Ciencias sociales) = exp{-exp(-1.293)^2.7 t ^2.7}
Ejemplo con España.
El coeficiente para Spain es -0.4 lo que implica que hay menos superviencia en España que en el Uk ( es decir, en España se consigue la plaza antes). Ahora bien, el coeficiente en España para los que se mueven es 0.66, lo que indica que hay mayor supervivencia para los que se mueven que para los que no.. Así, por un lado, se tiene que en España se obtiene la plaza antes, pero que moverse en España tiene un efecto negativo, (el que se fue a sevilla), comparado con el resto de países. Más aún, de los coeficiente positivos de las interacciones entre país y movilidad (en Bélgica, Francia y España), es el único con p_valor < 0.05.
Opinión personal mía . Con esta definición de permanent y movilidad España no sale bien parada, en general se tarda menos en obtener plaza. pero la movilidad influye muy negativamente.
# modelo 9 por kaplan meier
res.km <-
## Define a function to plot survreg prediction by gender
survreg.curves <- function(model, col = "black",
toPredict,
seq.quantiles = seq(from = 0.00, to = 0.97, by = 0.01),...) {
x = predict(model, newdata=toPredict,
type="quantile",
p = seq.quantiles)
rownames(x) <- paste0("mobility = ",toPredict$mobilityPG24, " ,Discipline = ", toPredict$Q6, ", Country = ", toPredict$Countryb )
y = 1-seq.quantiles
plot(x[1,],y,type="n", xlab="time", ylab="Survival", axes=F,xlim=c(0,20),...)
axis(1)
axis(2, at=c(0,0.25, 0.5, 0.75, 1))
plotlines <- function(t){
lines(x=t, y=y,...)
}
apply(x,1, plotlines)
}
# seleccionamos de los datos, los que sean Social Sciences, movilidad 1 y
# que pertenezcan a Uk y a Spain
dat1 <- data.frame(Q6 = "Social Sciences", mobilityPG24 = c(0,1),
Countryb = "spain" )
dat1
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 spain
## 2 Social Sciences 1 spain
survreg.curves(mod9.step, toPredict = dat1, main="mobility Yes vs No\n in Social Sciences in Spain")
## NULL
predict(mod9.step, newdata=dat1, type="response")
## 1 2
## 1.991886 8.227654
#
dat2 <- data.frame(Q6 = "Social Sciences", mobilityPG24 = c(0,1),
Countryb = "united kingdom" )
dat2
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 united kingdom
## 2 Social Sciences 1 united kingdom
survreg.curves(mod9.step, toPredict = dat2, main="mobility Yes vs No\n in Social Sciences in United kingdom")
## NULL
predict(mod9.step, newdata=dat2, type="response")
## 1 2
## 2.976423 6.315820
dat3 <- data.frame(Q6 = c("Social Sciences"),
mobilityPG24 = c(0,1),
Countryb = "italy" )
dat3
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 italy
## 2 Social Sciences 1 italy
predict(mod9.step, newdata=dat3, type="response")
## 1 2
## 5.221099 3.801065
survreg.curves(mod9.step, toPredict=dat3)
## NULL
Otra forma de pintarlo. Con la definición matemática
Comparamos movilidad frente no movilidad para los de agricultural sciences en reino unido. Como no hay interacción entre disciplina y movilididad, la separación entre curvas será la misma para las otras disciplinas en el Reino Unido.
# Curva supervivencia estimada para la cat de referencia (Uk, agricultural sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1]) * x)^(1/mod9.step$scale)),from=0, to=20,
col="red", main="Survival curve for Uk, \nin agricultural sciences discipline", xlab="Time", ylab = "Survival estimate")
# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1]-mod9.step$coef[7]) * x)^(1/mod9.step$scale)),
from=0, to = 20,
col="blue",add=TRUE)
legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)
# Curva supervivencia estimada para la cat Spain, Social sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]) * x)^(1/mod9.step$scale)),from=0, to=20,
col="red", main="Survival curve for Spain, \nin Social sciences discipline", xlab="Time", ylab = "Survival estimate")
# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]-mod9.step$coef[7]-mod9.step$coef[23]) * x)^(1/mod9.step$scale)),
from=0, to = 20,
col="blue",add=TRUE)
legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)
En Italia, las curvas de supervivencia son más parecidas, pero se invierte la relación, los que no se mueven tienen mayor supervivencia, es decir, obtienen la plaza después de los que se mueven.
Si nos fijamos en los coeficientes el de movilidad es de 0.75 y el de la interacción entre Italia y movilidad es de -1.07, con lo que en Italia el coef de movilidad es negativo e implica menos supervivencia (mayor probabilidad de tener la plaza antes) para los que se mueven. Esto también pasa en Alemania y Polonia , ambas con -0.79, pero en estos casos hacen que las curvas sean casi iguales para los que se mueven que para los que no. En el otro extremo están países como España o Bélgica.
# Curva supervivencia estimada para la cat Italia, Social sciences, no movilidad)
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]) * x)^(1/mod9.step$scale)),from=0, to=20,
col="red", main="Survival curve for Italy, \nin Social sciences discipline", xlab="Time", ylab = "Survival estimate")
# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]-mod9.step$coef[7]-mod9.step$coef[20]) * x)^(1/mod9.step$scale)),
from=0, to = 20,
col="blue",add=TRUE)
legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)
subdatos <- datos[ datos$Countryb=="italy" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))
plot(survfit(S2 ~ mobilityPG24, subdatos))
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]) * x)^(1/mod9.step$scale)),from=0, to=20,
col="red", add=TRUE)
# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[11]-mod9.step$coef[7]-mod9.step$coef[20]) * x)^(1/mod9.step$scale)),
from=0, to = 20,
col="blue",add=TRUE)
legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)
En españa
subdatos <- datos[ datos$Countryb=="spain" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))
plot(survfit(S2 ~ mobilityPG24, subdatos))
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]) * x)^(1/mod9.step$scale)),from=0, to=20,
col="red", add=TRUE)
# Comparamos los mismos con movilidad
curve(exp(-(exp(-mod9.step$coef[1] - mod9.step$coef[6]- mod9.step$coef[14]-mod9.step$coef[7]-mod9.step$coef[23]) * x)^(1/mod9.step$scale)),
from=0, to = 20,
col="blue",add=TRUE)
legend("topright",c("No mobility", " Mobility"), col=c("red","blue"),lwd=2)
uk
subdatos <- datos[ datos$Countryb=="united kingdom" & datos$Q6=="Social Sciences",]
S2 <- with(subdatos, Surv(time,status))
plot(survfit(S2 ~ mobilityPG24, subdatos))
Partimos del último modelo
mod.cph <- coxph(S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
data = datos)
summary(mod.cph)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
## data = datos)
##
## n= 5885, number of events= 4195
## (156 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z
## Q6Engineering and Technology 0.05907 1.06085 0.11564 0.511
## Q6Humanities 0.03976 1.04057 0.11925 0.333
## Q6Medical Sciences -0.19196 0.82534 0.12045 -1.594
## Q6Natural Sciences -0.07344 0.92919 0.11552 -0.636
## Q6Social Sciences 0.08955 1.09368 0.11564 0.774
## mobilityPG24 -0.25287 0.77657 0.06716 -3.765
## Countrybbelgium -0.36824 0.69195 0.13828 -2.663
## Countrybfrance 0.10604 1.11187 0.07154 1.482
## Countrybgermany -0.54527 0.57968 0.08068 -6.758
## Countrybitaly -0.20681 0.81318 0.06096 -3.392
## Countrybnetherlands -0.09036 0.91360 0.08330 -1.085
## Countrybpoland -0.22346 0.79974 0.13063 -1.711
## Countrybspain 0.17865 1.19561 0.07318 2.441
## Countrybsweden -0.29893 0.74161 0.07217 -4.142
## Countrybswitzerland -0.41366 0.66122 0.17463 -2.369
## mobilityPG24:Countrybbelgium -0.13287 0.87557 0.20732 -0.641
## mobilityPG24:Countrybfrance -0.09170 0.91238 0.12382 -0.741
## mobilityPG24:Countrybgermany 0.23511 1.26505 0.12484 1.883
## mobilityPG24:Countrybitaly 0.33619 1.39960 0.10462 3.213
## mobilityPG24:Countrybnetherlands 0.01108 1.01114 0.12763 0.087
## mobilityPG24:Countrybpoland 0.25837 1.29481 0.22391 1.154
## mobilityPG24:Countrybspain -0.23049 0.79415 0.12172 -1.894
## mobilityPG24:Countrybsweden 0.07204 1.07470 0.11326 0.636
## mobilityPG24:Countrybswitzerland 0.15504 1.16771 0.19872 0.780
## Pr(>|z|)
## Q6Engineering and Technology 0.609491
## Q6Humanities 0.738785
## Q6Medical Sciences 0.111006
## Q6Natural Sciences 0.524933
## Q6Social Sciences 0.438692
## mobilityPG24 0.000166 ***
## Countrybbelgium 0.007744 **
## Countrybfrance 0.138276
## Countrybgermany 1.40e-11 ***
## Countrybitaly 0.000693 ***
## Countrybnetherlands 0.278042
## Countrybpoland 0.087152 .
## Countrybspain 0.014640 *
## Countrybsweden 3.44e-05 ***
## Countrybswitzerland 0.017848 *
## mobilityPG24:Countrybbelgium 0.521569
## mobilityPG24:Countrybfrance 0.458928
## mobilityPG24:Countrybgermany 0.059665 .
## mobilityPG24:Countrybitaly 0.001312 **
## mobilityPG24:Countrybnetherlands 0.930847
## mobilityPG24:Countrybpoland 0.248545
## mobilityPG24:Countrybspain 0.058283 .
## mobilityPG24:Countrybsweden 0.524725
## mobilityPG24:Countrybswitzerland 0.435283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology 1.0608 0.9426 0.8457 1.3307
## Q6Humanities 1.0406 0.9610 0.8237 1.3145
## Q6Medical Sciences 0.8253 1.2116 0.6518 1.0451
## Q6Natural Sciences 0.9292 1.0762 0.7409 1.1653
## Q6Social Sciences 1.0937 0.9143 0.8719 1.3719
## mobilityPG24 0.7766 1.2877 0.6808 0.8858
## Countrybbelgium 0.6920 1.4452 0.5277 0.9074
## Countrybfrance 1.1119 0.8994 0.9664 1.2792
## Countrybgermany 0.5797 1.7251 0.4949 0.6790
## Countrybitaly 0.8132 1.2297 0.7216 0.9164
## Countrybnetherlands 0.9136 1.0946 0.7760 1.0756
## Countrybpoland 0.7997 1.2504 0.6191 1.0331
## Countrybspain 1.1956 0.8364 1.0358 1.3800
## Countrybsweden 0.7416 1.3484 0.6438 0.8543
## Countrybswitzerland 0.6612 1.5123 0.4696 0.9311
## mobilityPG24:Countrybbelgium 0.8756 1.1421 0.5832 1.3145
## mobilityPG24:Countrybfrance 0.9124 1.0960 0.7158 1.1630
## mobilityPG24:Countrybgermany 1.2651 0.7905 0.9905 1.6158
## mobilityPG24:Countrybitaly 1.3996 0.7145 1.1401 1.7182
## mobilityPG24:Countrybnetherlands 1.0111 0.9890 0.7874 1.2985
## mobilityPG24:Countrybpoland 1.2948 0.7723 0.8349 2.0082
## mobilityPG24:Countrybspain 0.7941 1.2592 0.6256 1.0081
## mobilityPG24:Countrybsweden 1.0747 0.9305 0.8608 1.3418
## mobilityPG24:Countrybswitzerland 1.1677 0.8564 0.7910 1.7238
##
## Concordance= 0.6 (se = 0.006 )
## Rsquare= 0.041 (max possible= 1 )
## Likelihood ratio test= 247.2 on 24 df, p=0
## Wald test = 246.9 on 24 df, p=0
## Score (logrank) test = 252.1 on 24 df, p=0
La categoría de referencia es Q6= Agricultural Sciences, mobility=0, Countryb= UK
Predicción del riesgo para los que se mueven o no en Italia cuya disciplina sea Social Sciencies, ese riesgo siempre es con respecto a la categoría de referencia
dat3
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 italy
## 2 Social Sciences 1 italy
predict(mod.cph, newdata=dat3, type="risk", se.fit=TRUE)
## $fit
## 1 2
## 1.104668 1.200650
##
## $se.fit
## 1 2
## 0.05164945 0.08035408
Predicción del riesgo para los que se mueven o no en España cuya disciplina sea Social Sciencies
dat1
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 spain
## 2 Social Sciences 1 spain
predict(mod.cph, newdata=dat1, type="risk", se.fit=TRUE)
## $fit
## 1 2
## 1.624184 1.001650
##
## $se.fit
## 1 2
## 0.08111035 0.08951564
Considerando estratos los países, esto lo que hace es que considera que hay una función de supervivencia base distinta en cada país, pero que la relación entre las covariables es la misma en cada país
mod.cph1 <- coxph(S1 ~ Q6 + mobilityPG24 + strata(Countryb),
data = datos, method="breslow")
summary(mod.cph1)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + strata(Countryb), data = datos,
## method = "breslow")
##
## n= 5885, number of events= 4195
## (156 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Q6Engineering and Technology 0.03306 1.03361 0.11562 0.286 0.775
## Q6Humanities 0.01006 1.01011 0.11945 0.084 0.933
## Q6Medical Sciences -0.18403 0.83191 0.12055 -1.527 0.127
## Q6Natural Sciences -0.07875 0.92427 0.11550 -0.682 0.495
## Q6Social Sciences 0.06076 1.06264 0.11575 0.525 0.600
## mobilityPG24 -0.16693 0.84626 0.03399 -4.911 9.08e-07
##
## Q6Engineering and Technology
## Q6Humanities
## Q6Medical Sciences
## Q6Natural Sciences
## Q6Social Sciences
## mobilityPG24 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology 1.0336 0.9675 0.8240 1.2965
## Q6Humanities 1.0101 0.9900 0.7993 1.2766
## Q6Medical Sciences 0.8319 1.2021 0.6568 1.0536
## Q6Natural Sciences 0.9243 1.0819 0.7370 1.1591
## Q6Social Sciences 1.0626 0.9411 0.8470 1.3333
## mobilityPG24 0.8463 1.1817 0.7917 0.9046
##
## Concordance= 0.562 (se = 0.016 )
## Rsquare= 0.01 (max possible= 1 )
## Likelihood ratio test= 59.34 on 6 df, p=6.115e-11
## Wald test = 58.44 on 6 df, p=9.337e-11
## Score (logrank) test = 58.62 on 6 df, p=8.592e-11
predict(mod.cph1, newdata=dat1, type="risk", se.fit=TRUE)
## $fit
## 1 2
## 1.1484146 0.9718541
##
## $se.fit
## 1 2
## 0.03485187 0.04126747
npsurv(mod.cph1)
## Call: npsurv(formula = mod.cph1)
##
## n events median 0.95LCL 0.95UCL
## united kingdom 1291 937 5 4 5
## belgium 199 105 8 6 10
## france 564 440 3 3 4
## germany 576 370 8 7 9
## italy 905 765 5 5 6
## netherlands 487 346 5 4 6
## poland 150 99 6 3 12
## spain 650 443 4 3 5
## sweden 767 500 6 6 7
## switzerland 296 190 7 6 8
survplot(npsurv(mod.cph1), levels.only=TRUE, conf="none", lty=1:10, n.risk=TRUE)
comprobación asunciones , comprobar también para modelo con país como estrato
ph_fit_ps <-cox.zph(mod.cph)
round(ph_fit_ps$table,3)
## rho chisq p
## Q6Engineering and Technology -0.027 3.119 0.077
## Q6Humanities -0.048 9.891 0.002
## Q6Medical Sciences -0.008 0.246 0.620
## Q6Natural Sciences 0.003 0.027 0.870
## Q6Social Sciences -0.050 10.731 0.001
## mobilityPG24 0.050 10.336 0.001
## Countrybbelgium 0.027 3.101 0.078
## Countrybfrance 0.056 13.345 0.000
## Countrybgermany 0.090 33.904 0.000
## Countrybitaly 0.115 54.937 0.000
## Countrybnetherlands -0.017 1.231 0.267
## Countrybpoland -0.044 8.065 0.005
## Countrybspain -0.026 2.875 0.090
## Countrybsweden 0.040 6.679 0.010
## Countrybswitzerland 0.013 0.717 0.397
## mobilityPG24:Countrybbelgium -0.018 1.408 0.235
## mobilityPG24:Countrybfrance -0.010 0.396 0.529
## mobilityPG24:Countrybgermany -0.037 5.670 0.017
## mobilityPG24:Countrybitaly -0.078 25.655 0.000
## mobilityPG24:Countrybnetherlands 0.007 0.196 0.658
## mobilityPG24:Countrybpoland 0.003 0.040 0.841
## mobilityPG24:Countrybspain 0.019 1.437 0.231
## mobilityPG24:Countrybsweden -0.028 3.189 0.074
## mobilityPG24:Countrybswitzerland -0.001 0.002 0.962
## GLOBAL NA 265.937 0.000
plot(ph_fit_ps)
abline(h=0, lty=3)
mod.cph.test <- coxph(S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
data = datos)
ph_fit <- cox.zph(mod.cph.test, transform="rank", global=FALSE)
round(ph_fit$table,3)
## rho chisq p
## Q6Engineering and Technology -0.026 2.823 0.093
## Q6Humanities -0.047 9.256 0.002
## Q6Medical Sciences -0.005 0.097 0.755
## Q6Natural Sciences 0.005 0.121 0.728
## Q6Social Sciences -0.049 10.294 0.001
## mobilityPG24 0.050 10.488 0.001
## Countrybbelgium 0.027 3.079 0.079
## Countrybfrance 0.052 11.593 0.001
## Countrybgermany 0.091 34.565 0.000
## Countrybitaly 0.116 55.819 0.000
## Countrybnetherlands -0.018 1.416 0.234
## Countrybpoland -0.046 8.901 0.003
## Countrybspain -0.028 3.229 0.072
## Countrybsweden 0.040 6.603 0.010
## Countrybswitzerland 0.015 0.958 0.328
## mobilityPG24:Countrybbelgium -0.017 1.277 0.258
## mobilityPG24:Countrybfrance -0.009 0.324 0.569
## mobilityPG24:Countrybgermany -0.038 5.974 0.015
## mobilityPG24:Countrybitaly -0.075 23.826 0.000
## mobilityPG24:Countrybnetherlands 0.008 0.247 0.619
## mobilityPG24:Countrybpoland 0.001 0.003 0.955
## mobilityPG24:Countrybspain 0.020 1.666 0.197
## mobilityPG24:Countrybsweden -0.025 2.688 0.101
## mobilityPG24:Countrybswitzerland -0.003 0.027 0.870
Como no se cumple la hipótesis de PH, o bien se estratifica (que no nos interesa) o se introduce que alguna covariable dependa también del tiempo. o se parte la función de supervivencia en trozos (investigar)
datos.bis <- survSplit(datos, cut=c(2,5), end = "time", event = "status", start="start")
datos.bis$gt[datos.bis$start<=2] <- 1
datos.bis$gt[datos.bis$start>2 & datos.bis$start<=5 ] <- 2
datos.bis$gt[datos.bis$start>5 ] <- 3
datos.bis$gt <- factor(datos.bis$gt)
# datos.bis$gt[datos.bis$start>10] <- 4
# vemos que coincide si ajusto datos.bis con start coincide
coxph(Surv(start,time,status)~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,data=datos.bis)
## Call:
## coxph(formula = Surv(start, time, status) ~ Q6 + mobilityPG24 +
## Countryb + mobilityPG24:Countryb, data = datos.bis)
##
##
## coef exp(coef) se(coef) z p
## Q6Engineering and Technology 0.0591 1.0608 0.1156 0.51 0.60949
## Q6Humanities 0.0398 1.0406 0.1192 0.33 0.73878
## Q6Medical Sciences -0.1920 0.8253 0.1205 -1.59 0.11101
## Q6Natural Sciences -0.0734 0.9292 0.1155 -0.64 0.52493
## Q6Social Sciences 0.0896 1.0937 0.1156 0.77 0.43869
## mobilityPG24 -0.2529 0.7766 0.0672 -3.77 0.00017
## Countrybbelgium -0.3682 0.6920 0.1383 -2.66 0.00774
## Countrybfrance 0.1060 1.1119 0.0715 1.48 0.13828
## Countrybgermany -0.5453 0.5797 0.0807 -6.76 1.4e-11
## Countrybitaly -0.2068 0.8132 0.0610 -3.39 0.00069
## Countrybnetherlands -0.0904 0.9136 0.0833 -1.08 0.27804
## Countrybpoland -0.2235 0.7997 0.1306 -1.71 0.08715
## Countrybspain 0.1787 1.1956 0.0732 2.44 0.01464
## Countrybsweden -0.2989 0.7416 0.0722 -4.14 3.4e-05
## Countrybswitzerland -0.4137 0.6612 0.1746 -2.37 0.01785
## mobilityPG24:Countrybbelgium -0.1329 0.8756 0.2073 -0.64 0.52157
## mobilityPG24:Countrybfrance -0.0917 0.9124 0.1238 -0.74 0.45893
## mobilityPG24:Countrybgermany 0.2351 1.2651 0.1248 1.88 0.05967
## mobilityPG24:Countrybitaly 0.3362 1.3996 0.1046 3.21 0.00131
## mobilityPG24:Countrybnetherlands 0.0111 1.0111 0.1276 0.09 0.93085
## mobilityPG24:Countrybpoland 0.2584 1.2948 0.2239 1.15 0.24854
## mobilityPG24:Countrybspain -0.2305 0.7941 0.1217 -1.89 0.05828
## mobilityPG24:Countrybsweden 0.0720 1.0747 0.1133 0.64 0.52473
## mobilityPG24:Countrybswitzerland 0.1550 1.1677 0.1987 0.78 0.43528
##
## Likelihood ratio test=247 on 24 df, p=0
## n= 12541, number of events= 4195
## (419 observations deleted due to missingness)
coxph(S1~Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,data=datos)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
## data = datos)
##
##
## coef exp(coef) se(coef) z p
## Q6Engineering and Technology 0.0591 1.0608 0.1156 0.51 0.60949
## Q6Humanities 0.0398 1.0406 0.1192 0.33 0.73878
## Q6Medical Sciences -0.1920 0.8253 0.1205 -1.59 0.11101
## Q6Natural Sciences -0.0734 0.9292 0.1155 -0.64 0.52493
## Q6Social Sciences 0.0896 1.0937 0.1156 0.77 0.43869
## mobilityPG24 -0.2529 0.7766 0.0672 -3.77 0.00017
## Countrybbelgium -0.3682 0.6920 0.1383 -2.66 0.00774
## Countrybfrance 0.1060 1.1119 0.0715 1.48 0.13828
## Countrybgermany -0.5453 0.5797 0.0807 -6.76 1.4e-11
## Countrybitaly -0.2068 0.8132 0.0610 -3.39 0.00069
## Countrybnetherlands -0.0904 0.9136 0.0833 -1.08 0.27804
## Countrybpoland -0.2235 0.7997 0.1306 -1.71 0.08715
## Countrybspain 0.1787 1.1956 0.0732 2.44 0.01464
## Countrybsweden -0.2989 0.7416 0.0722 -4.14 3.4e-05
## Countrybswitzerland -0.4137 0.6612 0.1746 -2.37 0.01785
## mobilityPG24:Countrybbelgium -0.1329 0.8756 0.2073 -0.64 0.52157
## mobilityPG24:Countrybfrance -0.0917 0.9124 0.1238 -0.74 0.45893
## mobilityPG24:Countrybgermany 0.2351 1.2651 0.1248 1.88 0.05967
## mobilityPG24:Countrybitaly 0.3362 1.3996 0.1046 3.21 0.00131
## mobilityPG24:Countrybnetherlands 0.0111 1.0111 0.1276 0.09 0.93085
## mobilityPG24:Countrybpoland 0.2584 1.2948 0.2239 1.15 0.24854
## mobilityPG24:Countrybspain -0.2305 0.7941 0.1217 -1.89 0.05828
## mobilityPG24:Countrybsweden 0.0720 1.0747 0.1133 0.64 0.52473
## mobilityPG24:Countrybswitzerland 0.1550 1.1677 0.1987 0.78 0.43528
##
## Likelihood ratio test=247 on 24 df, p=0
## n= 5885, number of events= 4195
## (156 observations deleted due to missingness)
mod.gt <- coxph(Surv(start,time,status)~Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb + mobilityPG24:gt + Q6:gt+ Countryb:gt,data=datos.bis)
ph_fit <- cox.zph(mod.gt, transform="rank")
round(ph_fit$table,3)
## rho chisq p
## Q6Engineering and Technology 0.006 0.155 0.694
## Q6Humanities -0.007 0.211 0.646
## Q6Medical Sciences 0.007 0.235 0.628
## Q6Natural Sciences 0.022 2.101 0.147
## Q6Social Sciences -0.010 0.410 0.522
## mobilityPG24 0.055 12.592 0.000
## Countrybbelgium 0.004 0.074 0.786
## Countrybfrance 0.066 18.608 0.000
## Countrybgermany 0.041 6.810 0.009
## Countrybitaly 0.086 30.004 0.000
## Countrybnetherlands -0.006 0.180 0.671
## Countrybpoland -0.020 1.816 0.178
## Countrybspain -0.011 0.486 0.486
## Countrybsweden 0.015 0.964 0.326
## Countrybswitzerland 0.002 0.015 0.902
## mobilityPG24:Countrybbelgium -0.015 0.926 0.336
## mobilityPG24:Countrybfrance -0.005 0.104 0.747
## mobilityPG24:Countrybgermany -0.035 5.094 0.024
## mobilityPG24:Countrybitaly -0.072 22.334 0.000
## mobilityPG24:Countrybnetherlands 0.006 0.139 0.709
## mobilityPG24:Countrybpoland -0.002 0.012 0.913
## mobilityPG24:Countrybspain 0.020 1.624 0.203
## mobilityPG24:Countrybsweden -0.024 2.394 0.122
## mobilityPG24:Countrybswitzerland -0.001 0.005 0.943
## mobilityPG24:gt2 -0.025 2.535 0.111
## Q6Engineering and Technology:gt2 -0.010 0.412 0.521
## Q6Humanities:gt2 -0.003 0.034 0.853
## Q6Medical Sciences:gt2 -0.010 0.435 0.510
## Q6Natural Sciences:gt2 -0.022 2.016 0.156
## Q6Social Sciences:gt2 0.001 0.008 0.927
## Countrybbelgium:gt2 0.001 0.005 0.946
## Countrybfrance:gt2 -0.029 3.345 0.067
## Countrybgermany:gt2 -0.010 0.393 0.530
## Countrybitaly:gt2 -0.036 5.410 0.020
## Countrybnetherlands:gt2 0.000 0.000 1.000
## Countrybpoland:gt2 0.022 2.068 0.150
## Countrybspain:gt2 -0.005 0.106 0.745
## Countrybsweden:gt2 -0.011 0.492 0.483
## Countrybswitzerland:gt2 -0.006 0.158 0.691
## GLOBAL NA 144.034 0.000
¿Estratificar por mobility? no tendría sentido, perderíamos como cuantificar mobility
mod.cph.test <- coxph(S1 ~ Q6 +mobilityPG24 + Countryb + mobilityPG24:Countryb ,
data = datos)
ph_fit <- cox.zph(mod.cph.test, transform="rank", global=FALSE)
round(ph_fit$table,3)
## rho chisq p
## Q6Engineering and Technology -0.026 2.823 0.093
## Q6Humanities -0.047 9.256 0.002
## Q6Medical Sciences -0.005 0.097 0.755
## Q6Natural Sciences 0.005 0.121 0.728
## Q6Social Sciences -0.049 10.294 0.001
## mobilityPG24 0.050 10.488 0.001
## Countrybbelgium 0.027 3.079 0.079
## Countrybfrance 0.052 11.593 0.001
## Countrybgermany 0.091 34.565 0.000
## Countrybitaly 0.116 55.819 0.000
## Countrybnetherlands -0.018 1.416 0.234
## Countrybpoland -0.046 8.901 0.003
## Countrybspain -0.028 3.229 0.072
## Countrybsweden 0.040 6.603 0.010
## Countrybswitzerland 0.015 0.958 0.328
## mobilityPG24:Countrybbelgium -0.017 1.277 0.258
## mobilityPG24:Countrybfrance -0.009 0.324 0.569
## mobilityPG24:Countrybgermany -0.038 5.974 0.015
## mobilityPG24:Countrybitaly -0.075 23.826 0.000
## mobilityPG24:Countrybnetherlands 0.008 0.247 0.619
## mobilityPG24:Countrybpoland 0.001 0.003 0.955
## mobilityPG24:Countrybspain 0.020 1.666 0.197
## mobilityPG24:Countrybsweden -0.025 2.688 0.101
## mobilityPG24:Countrybswitzerland -0.003 0.027 0.870
dat1
## Q6 mobilityPG24 Countryb
## 1 Social Sciences 0 spain
## 2 Social Sciences 1 spain
predict(mod.cph.test, dat1, type="risk")
## 1 2
## 1.624184 1.001650
summary(mod.cph.test)
## Call:
## coxph(formula = S1 ~ Q6 + mobilityPG24 + Countryb + mobilityPG24:Countryb,
## data = datos)
##
## n= 5885, number of events= 4195
## (156 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z
## Q6Engineering and Technology 0.05907 1.06085 0.11564 0.511
## Q6Humanities 0.03976 1.04057 0.11925 0.333
## Q6Medical Sciences -0.19196 0.82534 0.12045 -1.594
## Q6Natural Sciences -0.07344 0.92919 0.11552 -0.636
## Q6Social Sciences 0.08955 1.09368 0.11564 0.774
## mobilityPG24 -0.25287 0.77657 0.06716 -3.765
## Countrybbelgium -0.36824 0.69195 0.13828 -2.663
## Countrybfrance 0.10604 1.11187 0.07154 1.482
## Countrybgermany -0.54527 0.57968 0.08068 -6.758
## Countrybitaly -0.20681 0.81318 0.06096 -3.392
## Countrybnetherlands -0.09036 0.91360 0.08330 -1.085
## Countrybpoland -0.22346 0.79974 0.13063 -1.711
## Countrybspain 0.17865 1.19561 0.07318 2.441
## Countrybsweden -0.29893 0.74161 0.07217 -4.142
## Countrybswitzerland -0.41366 0.66122 0.17463 -2.369
## mobilityPG24:Countrybbelgium -0.13287 0.87557 0.20732 -0.641
## mobilityPG24:Countrybfrance -0.09170 0.91238 0.12382 -0.741
## mobilityPG24:Countrybgermany 0.23511 1.26505 0.12484 1.883
## mobilityPG24:Countrybitaly 0.33619 1.39960 0.10462 3.213
## mobilityPG24:Countrybnetherlands 0.01108 1.01114 0.12763 0.087
## mobilityPG24:Countrybpoland 0.25837 1.29481 0.22391 1.154
## mobilityPG24:Countrybspain -0.23049 0.79415 0.12172 -1.894
## mobilityPG24:Countrybsweden 0.07204 1.07470 0.11326 0.636
## mobilityPG24:Countrybswitzerland 0.15504 1.16771 0.19872 0.780
## Pr(>|z|)
## Q6Engineering and Technology 0.609491
## Q6Humanities 0.738785
## Q6Medical Sciences 0.111006
## Q6Natural Sciences 0.524933
## Q6Social Sciences 0.438692
## mobilityPG24 0.000166 ***
## Countrybbelgium 0.007744 **
## Countrybfrance 0.138276
## Countrybgermany 1.40e-11 ***
## Countrybitaly 0.000693 ***
## Countrybnetherlands 0.278042
## Countrybpoland 0.087152 .
## Countrybspain 0.014640 *
## Countrybsweden 3.44e-05 ***
## Countrybswitzerland 0.017848 *
## mobilityPG24:Countrybbelgium 0.521569
## mobilityPG24:Countrybfrance 0.458928
## mobilityPG24:Countrybgermany 0.059665 .
## mobilityPG24:Countrybitaly 0.001312 **
## mobilityPG24:Countrybnetherlands 0.930847
## mobilityPG24:Countrybpoland 0.248545
## mobilityPG24:Countrybspain 0.058283 .
## mobilityPG24:Countrybsweden 0.524725
## mobilityPG24:Countrybswitzerland 0.435283
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Q6Engineering and Technology 1.0608 0.9426 0.8457 1.3307
## Q6Humanities 1.0406 0.9610 0.8237 1.3145
## Q6Medical Sciences 0.8253 1.2116 0.6518 1.0451
## Q6Natural Sciences 0.9292 1.0762 0.7409 1.1653
## Q6Social Sciences 1.0937 0.9143 0.8719 1.3719
## mobilityPG24 0.7766 1.2877 0.6808 0.8858
## Countrybbelgium 0.6920 1.4452 0.5277 0.9074
## Countrybfrance 1.1119 0.8994 0.9664 1.2792
## Countrybgermany 0.5797 1.7251 0.4949 0.6790
## Countrybitaly 0.8132 1.2297 0.7216 0.9164
## Countrybnetherlands 0.9136 1.0946 0.7760 1.0756
## Countrybpoland 0.7997 1.2504 0.6191 1.0331
## Countrybspain 1.1956 0.8364 1.0358 1.3800
## Countrybsweden 0.7416 1.3484 0.6438 0.8543
## Countrybswitzerland 0.6612 1.5123 0.4696 0.9311
## mobilityPG24:Countrybbelgium 0.8756 1.1421 0.5832 1.3145
## mobilityPG24:Countrybfrance 0.9124 1.0960 0.7158 1.1630
## mobilityPG24:Countrybgermany 1.2651 0.7905 0.9905 1.6158
## mobilityPG24:Countrybitaly 1.3996 0.7145 1.1401 1.7182
## mobilityPG24:Countrybnetherlands 1.0111 0.9890 0.7874 1.2985
## mobilityPG24:Countrybpoland 1.2948 0.7723 0.8349 2.0082
## mobilityPG24:Countrybspain 0.7941 1.2592 0.6256 1.0081
## mobilityPG24:Countrybsweden 1.0747 0.9305 0.8608 1.3418
## mobilityPG24:Countrybswitzerland 1.1677 0.8564 0.7910 1.7238
##
## Concordance= 0.6 (se = 0.006 )
## Rsquare= 0.041 (max possible= 1 )
## Likelihood ratio test= 247.2 on 24 df, p=0
## Wald test = 246.9 on 24 df, p=0
## Score (logrank) test = 252.1 on 24 df, p=0
anova(mod.cph.test)
## Analysis of Deviance Table
## Cox model: response is S1
## Terms added sequentially (first to last)
##
## loglik Chisq Df Pr(>|Chi|)
## NULL -33037
## Q6 -33012 51.533 5 6.726e-10 ***
## mobilityPG24 -32990 43.268 1 4.774e-11 ***
## Countryb -32928 124.650 9 < 2.2e-16 ***
## mobilityPG24:Countryb -32914 27.701 9 0.00107 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1